
affinity propagation: add comparison test with sklearn implementation #132

Merged
merged 3 commits on Nov 20, 2018

Conversation

xiuliren
Contributor

see issue:
#131

@xiuliren xiuliren changed the title add comparison with python results to illustrate the difference with python add comparison test with python affinity propergation Nov 19, 2018
@alyst alyst changed the title add comparison test with python affinity propergation affinity propagation: add comparison test with sklearn implementation Nov 19, 2018
@alyst
Member

alyst commented Nov 19, 2018

Thanks for your work! Am I right that by "python comparison" you mean the sklearn implementation?

> I'll try to reimplement it if I find some time.

You can try it, but keep in mind that there should be clear benefits for the package to switch to it:

  • correctness (produce correct results where the current one fails)
  • performance (be faster or use less memory than the current implementation)

Actually, you will already need some tests when/if you write your own implementation of the method.

The current state of affprop tests in master is obviously unsatisfactory, because there are no tests for the correctness of the generated clusters.
What I had in mind was some series of very simple tests:

  • one 2D point case
  • two 2D points cases (distance below or above the tolerance)
  • three 2D points case
  • 10 2D points case

In all these test cases the point coordinates should be fixed (not randomly generated), so the "human eye" can easily see the expected result.
If one of these simple tests fails, it should be relatively easy to find the cause.

The current test of 500 random points in 10 dimensions should come last. A minimal sketch of one of the simple fixed-point cases is shown below.
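To make the intent concrete, here is a minimal sketch of what one of these fixed-point tests could look like. It assumes the package's affinityprop and assignments API, assumes that the diagonal of the similarity matrix acts as the preference, and uses illustrative coordinates and an illustrative preference value rather than anything from this PR:

```julia
using Test, Clustering

# Three fixed 2D points (columns): two close together, one far away.
X = [0.0 0.1 5.0;
     0.0 0.0 5.0]

# Similarity = negative squared Euclidean distance, as in Frey & Dueck (2007).
S = -[sum(abs2, X[:, i] .- X[:, j]) for i in 1:3, j in 1:3]

# The diagonal of S acts as the preference; pick a value between the near
# similarity (-0.01) and the far ones (about -50) so the expected clustering
# is unambiguous.
for i in 1:3
    S[i, i] = -25.0
end

result = affinityprop(S)

# Points 1 and 2 should share a cluster; point 3 should be on its own.
@test assignments(result)[1] == assignments(result)[2]
@test assignments(result)[1] != assignments(result)[3]
```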
I have some questions about your comparison with the sklearn results:

  • how did you generate python_assignments? Have you saved the Julia S matrix and loaded it into Python?
  • the clusters generated by "python" and affprop() could have different labels (assignments), e.g. the very same set of elements may be in cluster #11 in "python" and in cluster #95 in "Julia". Do you check for that?

@xiuliren
Contributor Author

xiuliren commented Nov 20, 2018

Good point, I just tried computing the Rand index for validation, and the current implementation is correct. Can we still merge this PR to keep the test?

Here is what I used for the python implementation (a sketch of the Rand index comparison follows the snippet):

```julia
using PyCall

# fit sklearn's AffinityPropagation on the precomputed similarity matrix
@pyimport sklearn.cluster as cl
af = cl.AffinityPropagation(affinity="precomputed")[:fit](similarityMatrix)

# sklearn labels are 0-based; shift to 1-based to match Julia assignments
labels = af[:labels_] .+ 1
```
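For reference, a toy sketch of the label-permutation-safe comparison, using Clustering's randindex. The vectors here are made-up placeholders; in the actual test the first argument would be the hard-coded sklearn labels and the second the affinityprop assignments:

```julia
using Test, Clustering

# Two partitions of six points that agree up to relabeling
# (the reference uses labels 1/2, ours uses 2/1 for the same grouping).
ref_assignments   = [1, 1, 1, 2, 2, 2]
julia_assignments = [2, 2, 2, 1, 1, 1]

# randindex returns a tuple whose first element is the adjusted Rand index;
# it equals 1 exactly when the two partitions coincide up to relabeling.
@test randindex(ref_assignments, julia_assignments)[1] ≈ 1.0
```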

@alyst
Member

alyst commented Nov 20, 2018

@jingpengw Thanks for checking this! I think it makes sense to merge with minor corrections:

  1. Include your PyCall script as a block comment (#= =#) before python_assignments, with a note that this is how the vector of assignments was generated: if the test is modified later, we will know how to update the expected output (see the sketch after this list)
  2. Rename python_assignments to ref_assignments and remove the labels variable; I guess it is not used/needed
  3. Split the ref_assignments vector into several lines, none exceeding 80 characters
  4. Remove all println() calls: they generate noise in CI, and if the randindex() test fails, there will be an actual-vs-expected error log anyway
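For point 1, a rough sketch of how the provenance block comment and the wrapped reference vector could be laid out; the values below are placeholders, not the actual sklearn output:

```julia
#=
ref_assignments below was generated with sklearn via PyCall, using the
same similarity matrix S as the Julia test:

    using PyCall
    @pyimport sklearn.cluster as cl
    af = cl.AffinityPropagation(affinity="precomputed")[:fit](S)
    ref = af[:labels_] .+ 1
=#
# Placeholder values; the real test hard-codes the full sklearn output,
# wrapped so that no line exceeds 80 characters.
ref_assignments = [1, 1, 2, 2, 3, 3,
                   1, 2, 3, 1, 2, 3]
```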

@xiuliren
Contributor Author

Updated, please take a look.

@alyst alyst merged commit da391f4 into JuliaStats:master Nov 20, 2018